Memory with addressable subword support

ABSTRACT

Integrated circuits are provided that have memory arrays. The memory arrays may include rows and columns of data byte storage locations. To implement algorithms that process data subwords, a memory array may be partitioned into individual memory banks, each of which has its own associated data register and its own associated address decoder. Each address decoder may receive address signals from an associated multiplexer. Address mapping circuits may be used to distribute address signals to multiplexer inputs using a non-blocking memory architecture. The memory architecture allows collections of data bytes to be written to and read from the memory array using column-wise and row-wise read and write operations. The data bytes that are written to the array and that are read from the array may be stored in adjacent data byte locations in the array.

BACKGROUND

This invention relates to integrated circuits, and more particularly, to integrated circuits with memory that is used in processing subwords of data.

Memory is widely used in the integrated circuit industry. Memory arrays are formed as part of integrated circuits such as application-specific integrated circuits, programmable logic device integrated circuits, digital signal processors, microprocessors, microcontrollers, and memory chips.

Memory arrays often handle data in the form of relatively large data words. For example, data may be read from and written to memory arrays in 32-bit words. Words of this bit length are used to improve efficiency and reduce circuit overhead.

In arrangements in which data is handled in large words, each data word may contain multiple bytes of data. For example, a 32-bit word may contain four eight-bit bytes of data. The data bytes in the data word may sometimes be referred to as subwords.
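As a minimal illustration of the word/subword relationship, the following Python sketch splits a 32-bit word into four eight-bit subwords and reassembles it. The helper names `split_word` and `join_subwords`, and the choice of placing the least significant byte first, are illustrative assumptions; the text does not specify a byte order.

```python
def split_word(word: int) -> list[int]:
    """Split a 32-bit data word into its four 8-bit subwords.

    Subword 0 is assumed (for illustration only) to be the least
    significant byte of the word.
    """
    if not 0 <= word < 1 << 32:
        raise ValueError("word must fit in 32 bits")
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def join_subwords(subwords: list[int]) -> int:
    """Reassemble four 8-bit subwords into one 32-bit word."""
    word = 0
    for i, byte in enumerate(subwords):
        word |= (byte & 0xFF) << (8 * i)
    return word


print([hex(b) for b in split_word(0x11223344)])  # ['0x44', '0x33', '0x22', '0x11']
```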

Many modern data processing algorithms involve the manipulation of subwords of data. For example, it may be necessary to store and retrieve subwords of image data in a memory array when performing image compression. As another example, wireless communications standards such as the emerging 4G wireless communications standards may require the processing of individual subwords. With processing algorithms such as these, it may be desired, for example, to write subwords into a memory array in a column-wise fashion and to read subwords from the same memory array in a row-wise fashion. Operations such as these can be cumbersome in conventional memory arrays, because they require numerous full-word read and write operations and data manipulations such as data shifting and combining operations.

It would therefore be desirable to be able to provide improved memory circuits for handling subword processing operations on integrated circuits.

SUMMARY

In accordance with the present invention, integrated circuits are provided with memory circuitry. The integrated circuits may be programmable integrated circuits such as programmable logic devices that contain blocks of programmable logic. The resources of the blocks of programmable logic or other such circuitry may be configured to implement processing circuitry. The processing circuitry may be used to implement data processing algorithms. In performing the data processing algorithms, the processing circuitry may perform read and write operations on data in the memory circuitry.

The data may be stored in the form of individually addressable data bytes. The data bytes may be stored in rows and columns of data byte locations in a memory array. Multiple adjacent data bytes in the array may be written and read in a single clock cycle. To avoid collisions, the memory array may be partitioned into blocks and each of the adjacent data bytes may be accessed using a different respective memory block within the memory array. Each such memory block may have its own associated data register and its own associated address decoder. Each address decoder may receive address signals from an associated multiplexer. Address mapping circuits may be used to distribute address signals to multiplexer inputs using a non-blocking memory architecture. The memory architecture allows groups of data bytes to be written to and read from the memory array using both column-wise and row-wise read and write operations. For example, multiple bytes of data may be written into adjacent locations in the memory array in a column-wise fashion in a single clock cycle. In a different clock cycle, a different set of data bytes may be read from adjacent locations in the memory array in a row-wise fashion (as an example).

Further features of the invention, its nature and various advantages will be more apparent from the accompanying drawings and the following detailed description of the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an illustrative integrated circuit such as a programmable integrated circuit containing memory in accordance with an embodiment of the present invention.

FIG. 2 is a diagram of a conventional memory array in which data is accessed in 32-bit words.

FIG. 3 is a diagram of a memory array that has been partitioned into multiple subarrays to support the individual accessing of subwords of data in accordance with an embodiment of the present invention.

FIG. 4 is a diagram of illustrative address mapping circuitry that may be used in addressing the memory array circuitry of FIG. 3 in accordance with an embodiment of the present invention.

FIG. 5 is a diagram of an illustrative memory partitioning scheme that may be used to ensure that certain simultaneous adjacent row-wise and column-wise memory subword access operations may be performed satisfactorily in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

The present invention relates to memory and processing circuitry that may be used in implementing algorithms in which data is read in memory array rows and written in memory array columns or vice versa. For example, the circuitry may be used in corner turning algorithms and the like. These algorithms typically require the manipulation of multiple independent subwords of data (e.g., data in eight-bit bytes) and can be computationally expensive to implement in conventional memory arrays in which data is handled in large data words (e.g., 32-bit data words).

The circuitry of the present invention may be used in any suitable integrated circuits, such as application-specific integrated circuits, electrically programmable and mask-programmable programmable logic device integrated circuits, digital signal processors, microprocessors, microcontrollers, and memory chips. If desired, the circuitry of the present invention may be used in programmable integrated circuits that are not traditionally referred to as programmable logic devices, such as microprocessors containing programmable logic, digital signal processors containing programmable logic, custom integrated circuits containing regions of programmable logic, or other programmable integrated circuits that contain programmable logic and one or more memory arrays.

The present invention is sometimes described herein in connection with memory arrays and associated circuitry on programmable integrated circuits such as programmable logic device integrated circuits. This is, however, merely illustrative. Memory circuitry in accordance with the invention may be used on any suitable integrated circuit if desired.

An illustrative integrated circuit device 10 such as a programmable logic device or other programmable integrated circuit in accordance with the present invention is shown in FIG. 1.

Device 10 may have input/output circuitry 12 for driving signals off of device 10 and for receiving signals from other devices via input/output pins 14. Interconnection resources 16 such as global and local vertical and horizontal conductive lines and busses may be used to route signals on device 10. Interconnection resources 16 include conductive lines and programmable connections between respective conductive lines and are therefore sometimes referred to as programmable interconnects 16.

Device 10 may contain programmable logic 18 and memory blocks (arrays) 22.

Programmable logic 18 may include combinational and sequential logic circuitry. The programmable logic 18 may be configured to perform a custom logic function. The programmable interconnects 16 may be considered to be a type of programmable logic 18.

As shown in FIG. 1, device 10 may contain programmable memory elements 20. Memory elements 20 can be loaded with configuration data (also called programming data) using pins 14 and input/output circuitry 12. Once loaded, the memory elements can each provide a corresponding static control output signal that controls the state of an associated logic component in programmable logic 18, such as a logic component formed from one or more metal-oxide-semiconductor transistors. The static control output signals may, for example, be provided to the gates of metal-oxide-semiconductor transistors to turn the transistors on and off to configure logic 18 as desired.

Memory elements 20 may use any suitable volatile and/or non-volatile memory structures such as random-access-memory (RAM) cells, fuses, antifuses, registers, programmable read-only-memory memory cells, mask-programmed and laser-programmed structures, etc. Because memory elements 20 are loaded with configuration data during programming, memory elements 20 are sometimes referred to as configuration memory or configuration RAM. Mask-programmed programmable logic devices, which are sometimes referred to as structured application-specific integrated circuits, are programmed by using lithographic masks to create a custom pattern of connections in an array of vias based on configuration data.

Memory arrays 22 may contain rows and columns of volatile memory elements such as random-access-memory (RAM) cells. The memory arrays 22 may be used to store data signals during normal operation of device 10. For example, memory arrays 22 may be used to store data that is being received and processed as part of a wireless communications channel, data that is associated with an image file, or any other suitable data. If desired, software code may be loaded onto memory arrays 22 and executed by processing circuitry on device 10 (e.g., hardwired processing circuitry and processing circuitry implemented using the resources available in programmable logic 18).

The memory arrays 22 on a given device 10 need not all be the same size. For example, small, medium, and large memory arrays 22 may be included on the same programmable logic device (or other integrated circuit). There may, for example, be hundreds of small memory arrays each having a capacity of about 512 bits, 2-9 large memory arrays each having a capacity of about half of a megabit, and an intermediate number of medium-size memory arrays each having a capacity of about 4 kilobits. These are merely illustrative memory array sizes and quantities. In general, there may be any suitable size and number of memory arrays 22 on device 10. There may also be any suitable number of regions of programmable logic 18.

The circuitry of device 10 may be organized using any suitable architecture. As an example, the logic of programmable logic device 10 may be organized in a series of rows and columns of larger programmable logic regions, each of which contains multiple smaller logic regions. The resources of device 10 such as programmable logic 18 and memory 22 may be interconnected by programmable interconnects 16. Interconnects 16 generally include vertical and horizontal conductors. These conductors may include global conductive lines that span substantially all of device 10, fractional lines such as half-lines or quarter-lines that span part of device 10, staggered lines of a particular length (e.g., sufficient to interconnect several logic areas), smaller local lines, or any other suitable interconnection resource arrangement. If desired, the logic of device 10 may be arranged in more levels or layers in which multiple large regions are interconnected to form still larger portions of logic. Still other device arrangements may use logic that is not arranged in rows and columns.

In addition to the relatively large blocks of programmable logic that are shown in FIG. 1, the device 10 generally also includes some programmable logic associated with the programmable interconnects, memory, and input-output circuitry on device 10. For example, input-output circuitry 12 may contain programmable input and output buffers. Interconnects 16 may be programmed to route signals to a desired destination.

In accordance with the present invention, an integrated circuit (e.g., a programmable integrated circuit or other integrated circuit) may contain memory circuitry (e.g., memory 22 of FIG. 1) that is configured to support data processing algorithms in which multiple subwords (bytes) of data are manipulated in parallel. With circuits in accordance with the present invention, a memory array may be partitioned into multiple blocks. Each block of the memory array may be provided with a corresponding individually controlled address decoder. Address mapping circuitry may be used to provide address signals to the address decoders for the partitioned array. The address mapping circuitry may be used to implement a memory address tiling pattern that allows multiple adjacent subwords of data in the array to be accessed in both row-wise and column-wise arrangements without stalling the memory. This is not possible in conventional memory arrays in which data is manipulated in relatively large words.

Consider, as an example, the conventional memory circuitry of FIG. 2. As shown in FIG. 2, memory circuitry 26 may include a memory array 28 that is arranged in 16 columns 30, each containing 32 memory cells. With this type of configuration, the memory cells of array 28 may store 512 bits of data. Encoded address signals may be supplied to address decoder 34 over path 40. Four bits of encoded address may be used to uniquely identify which of the 16 columns 30 of array 28 is to be accessed. Address decoder 34 can decode the encoded address that is presented on input 40 and can provide a corresponding unencoded (decoded) version of the address on output lines 36.

Lines 36, which are sometimes referred to as address lines or word lines, may be used to determine which of the columns of memory cells in array 28 are being accessed. Each of lines 36 may be associated with a corresponding address signal (AD0, AD1, AD2, . . . AD15). When it is desired to access a particular column in array 28 for reading or writing, the address that is associated with that column may be asserted, while the addresses associated with the remaining columns in array 28 are deasserted. For example, if it is desired to access the third column from the left in array 28, address signal AD2 may be asserted (e.g., taken to a logic high value) while address signals AD0, AD1, AD3, AD4, . . . AD15 are deasserted (e.g., taken to a logic low value). When signal AD2 is asserted in this way, data may be written into the third column from the left in array 28 from data registers 32 over bit lines (data lines) 38, or data may be read from the third column in array 28 into data registers 32 over bit lines 38. Data register circuitry 32 may be connected to other circuitry on an integrated circuit such as processing circuitry.
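The behavior of address decoder 34 can be modeled in a few lines. The following Python function is a behavioral sketch rather than a circuit description; the name `decode` is hypothetical. It turns a four-bit encoded column address into the sixteen decoded lines AD0 through AD15, exactly one of which is asserted.

```python
NUM_COLUMNS = 16  # columns 30 of array 28; 2**4 == 16, so 4 address bits suffice


def decode(encoded: int) -> list[int]:
    """Return the decoded address lines AD0..AD15 for a 4-bit encoded address.

    Exactly one line is asserted (1); all other lines are deasserted (0).
    """
    if not 0 <= encoded < NUM_COLUMNS:
        raise ValueError(f"address must be in 0..{NUM_COLUMNS - 1}")
    return [1 if i == encoded else 0 for i in range(NUM_COLUMNS)]


lines = decode(0b0010)  # select the third column from the left
print([f"AD{i}" for i, v in enumerate(lines) if v])  # ['AD2']
```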

In a typical arrangement, memory circuitry 26 of FIG. 2 is used in a system in which data is processed in 32-bit words. Each 32-bit word may be made up of four eight-bit bytes of data. In some scenarios, an application may process data from array 28 using all four bytes from a given column at once. However, not all applications process data in this way. In particular, some data processing algorithms may need to process data on the subword level (i.e., as individual bytes, rather than in four-byte words). It may be necessary, for example, to process the first byte in the column associated with address AD2, the first byte in the column associated with address AD3, the first byte in the column associated with address AD4, and the first byte in the column associated with address AD5, rather than the four bytes associated with a particular column. Accessing these bytes of data in array 28 can be cumbersome, because each column of data must be accessed in its entirety using a separate clock cycle, even though only a portion of the data associated with each column is required. Data may then need to be manipulated using shifter circuits. As a result of these inefficiencies, conventional memory array circuits such as circuit 26 of FIG. 2 may be unsuitable for implementing many data processing algorithms.
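To make the cost concrete, the following Python sketch models gathering one byte from each of several columns of a word-oriented array such as array 28. It assumes one full 32-bit column read per clock cycle and treats the "first" byte as the least significant byte; the function name `gather_bytes` and the sample array contents are hypothetical.

```python
def gather_bytes(array28: list[int], columns: list[int],
                 byte_index: int = 0) -> tuple[list[int], int]:
    """Collect one byte from each requested column of a word-oriented array.

    Each column holds one 32-bit word and can only be read whole, so every
    requested column costs one full-word read (one clock cycle), followed
    by a shift and mask to isolate the wanted subword.
    """
    cycles = 0
    result = []
    for col in columns:
        word = array28[col]                                # full 32-bit read
        cycles += 1                                        # one cycle per column
        result.append((word >> (8 * byte_index)) & 0xFF)   # shift/mask step
    return result, cycles


# Hypothetical contents: column n holds the word 0xA0B0C0D0 + n.
array28 = [0xA0B0C0D0 + n for n in range(16)]
bytes_out, cycles = gather_bytes(array28, [2, 3, 4, 5])
print([hex(b) for b in bytes_out], cycles)  # ['0xd2', '0xd3', '0xd4', '0xd5'] 4
```

Four cycles move 128 bits across the bit lines to deliver 32 useful bits; this is the inefficiency the banked arrangement described below is meant to remove.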

Memory circuitry in accordance with embodiments of the present invention can overcome these shortcomings of conventional memory arrays by providing the ability to independently access multiple subwords of data in a single clock cycle. This may be accomplished by partitioning a memory array into multiple memory blocks and providing each memory block with associated address decoder circuitry. Address mapping circuitry may be used to support both row-wise and column-wise access to adjacent subwords in the array without collisions.

An illustrative memory array using a memory architecture in accordance with the present invention is shown in FIG. 3. As shown in FIG. 3, memory circuitry 22 may be organized in multiple banks of memory 42. Each memory bank 42 may, for example, represent a subset of a conventional memory array such as memory array 28 of FIG. 2. There may, in general, be any suitable number of memory banks in a given memory 22. In the example of FIG. 3, memory 22 has been organized in four banks of memory 42. These four memory banks 42 are labeled “memory bank A,” “memory bank B,” “memory bank C,” and “memory bank D” in FIG. 3. This is merely an example. A given memory array may be divided into any suitable number of banks (e.g., more than four banks or fewer than four banks). An arrangement such as the arrangement of FIG. 3 will allow four adjacent subwords to be accessed in a given clock cycle in either a row-wise or column-wise orientation. In a memory 22 with a larger number of memory banks 42 (e.g., six memory banks), a correspondingly larger number of adjacent subwords could be accessed (e.g., six adjacent subwords).

Each memory bank 42 may have a corresponding set of bit lines 44. During writing operations, data may be loaded into memory banks 42 from associated data register circuits 46 over associated bit lines 44. During data reading operations, data may be read from memory banks 42 and may be passed to associated circuitry such as data register circuits 46 over bit lines 44. There are eight bit lines in the set of bit lines 44 associated with each memory bank in the FIG. 3 arrangement. For example, a first set of eight bit lines 44 is used to interconnect memory bank A with data register circuitry A. Similarly, second, third, and fourth sets of eight bit lines each are used in interconnecting memory banks B, C, and D with respective data register circuits B, C, and D.

Each memory bank 42 may have an associated address decoder 48. Address decoder A may be used to provide address signals to memory bank A, address decoder B may be used to provide address signals to memory bank B, address decoder C may be used to provide address signals to memory bank C, and address decoder D may be used to provide address signals to memory bank D.

Address decoders 48 may have inputs 54 at which encoded versions of the address signals are received. Address decoders 48 may decode these encoded address signals to produce corresponding decoded versions of the address signals on address lines 50. Address lines 50 convey these address signals to banks 42 to provide addressing when accessing the data in the memory cells of banks 42. In the FIG. 3 example, each memory bank 42 has 16 associated columns of memory cells, which can be uniquely addressed using the four-bit address provided to the address input 54 for that memory bank. This is merely illustrative. Memory banks 42 may, in general, have any suitable number of memory cells and may be addressed using any suitable number of address lines.

Each column of the FIG. 3 memory banks contains eight memory cells and is used in storing a respective byte of data. For example, memory bank A contains 16 columns of memory cells, and each of these 16 columns contains eight memory cells that store a byte of data that can be accessed using the eight respective bit lines 44 that are associated with memory bank A.
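The bank organization described above can be sketched behaviorally in Python. In this hypothetical `BankedMemory` class, each of banks A through D holds 16 byte locations reached through a four-bit local address, and requests for one simulated clock cycle are keyed by bank, so each bank (with its own decoder and data register) services at most one byte per cycle. The class and method names are illustrative assumptions, not part of the described circuitry.

```python
BANKS = "ABCD"           # memory banks A-D of FIG. 3
LOCATIONS_PER_BANK = 16  # 16 byte-wide columns per bank (4-bit local address)


class BankedMemory:
    """Behavioral model of a four-bank memory such as memory 22 of FIG. 3.

    Requests are keyed by bank, so each bank services at most one byte per
    simulated clock cycle, while all four banks can work in the same cycle.
    """

    def __init__(self) -> None:
        self.banks = {b: [0] * LOCATIONS_PER_BANK for b in BANKS}

    @staticmethod
    def _check(bank: str, local: int) -> None:
        if bank not in BANKS or not 0 <= local < LOCATIONS_PER_BANK:
            raise ValueError(f"invalid access: bank {bank!r}, location {local}")

    def write_cycle(self, requests: dict[str, tuple[int, int]]) -> None:
        """One write cycle: requests maps bank -> (local_address, byte)."""
        for bank, (local, value) in requests.items():
            self._check(bank, local)
            self.banks[bank][local] = value & 0xFF

    def read_cycle(self, requests: dict[str, int]) -> dict[str, int]:
        """One read cycle: requests maps bank -> local_address."""
        for bank, local in requests.items():
            self._check(bank, local)
        return {bank: self.banks[bank][local] for bank, local in requests.items()}


mem = BankedMemory()
mem.write_cycle({"A": (0, 0x11), "B": (5, 0x22), "C": (9, 0x33), "D": (15, 0x44)})
print(mem.read_cycle({"A": 0, "B": 5, "C": 9, "D": 15}))
# {'A': 17, 'B': 34, 'C': 51, 'D': 68}
```

Note that the four bytes in this example sit at unrelated local addresses; independent decoders are what make such a mixed access possible in a single cycle.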

The address signals that are provided to address inputs 54 may be produced by address mapping circuitry connected to the inputs of multiplexers 52. In an arrangement of the type shown in FIG. 3, in which there are four memory banks 42 and four corresponding address decoders 48, there may be four associated multiplexers 52. Each multiplexer 52 may have multiple inputs 56. Each input 56 for a given multiplexer 52 may have multiple address lines and a corresponding control line. For example, the first (topmost) input 56 to multiplexer A may have four address lines that carry four-bit address signal A0 and a control input that carries control signal SA0. The second input 56 to multiplexer A may have four address lines that carry four-bit address signal A1 and a control input that carries control signal SA1. The third and fourth inputs to multiplexer A may be configured similarly. The third input may receive signals A2 and SA2 and the fourth input may receive signals A3 and SA3. Multiplexers B, C, and D may receive the same types of address and control signals.

The control (selection) signals that are applied to each multiplexer input dictate which address signals for that multiplexer are passed to the multiplexer output. To ensure that there are no collisions between address signals, the control signals for each multiplexer may be encoded using a one-hot encoding scheme. With a one-hot encoding scheme, only one of the control signals is asserted (e.g., taken to a logic high value), while all remaining control signals are deasserted (e.g., taken to a logic low value).

Consider, as an example, the control signals SA0, SA1, SA2, and SA3 that are applied to the control inputs of multiplexer A. If a given one of these control signals is asserted, its associated address signals will be passed to the output of multiplexer A on the four lines that make up the address path 54 between multiplexer A and address decoder A. For example, if signal SA0 is taken high, the signal A0 will be routed from the first input of multiplexer A to the output of multiplexer A. Similarly, if signal SA1 is taken high, multiplexer A will route address signal A1 to the output of multiplexer A.

Using a one-hot encoding scheme, the control signals SA0, SA1, SA2, and SA3 never contain more than a single logic high value at a given time. For example, when asserting SA2 to route signal A2 to the output of multiplexer A, signals SA0, SA1, and SA3 may all be taken low. During operation, each multiplexer in memory 22 receives a respective set of one-hot encoded control signals. Multiplexer A receives one-hot encoded control signals SA0, SA1, SA2, and SA3, multiplexer B receives one-hot encoded control signals SB0, SB1, SB2, and SB3, multiplexer C receives one-hot encoded control signals SC0, SC1, SC2, and SC3, and multiplexer D receives one-hot encoded control signals SD0, SD1, SD2, and SD3.
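A one-hot multiplexer of the kind described above can be sketched in Python as follows. The function names `is_one_hot` and `multiplexer`, and the sample address values, are hypothetical; the sketch rejects any control vector with zero or several asserted lines, which is the collision the one-hot scheme is meant to rule out.

```python
def is_one_hot(controls: list[int]) -> bool:
    """True if every control is 0 or 1 and exactly one is asserted."""
    return all(c in (0, 1) for c in controls) and sum(controls) == 1


def multiplexer(addresses: list[int], controls: list[int]) -> int:
    """Model of one multiplexer 52: pass the address whose control is high.

    addresses[i] and controls[i] correspond to, e.g., A_i and SA_i for
    multiplexer A.
    """
    if len(addresses) != len(controls):
        raise ValueError("each input needs one address and one control line")
    if not is_one_hot(controls):
        raise ValueError(f"controls {controls} are not one-hot")
    return addresses[controls.index(1)]


# Multiplexer A with SA2 asserted routes A2 to address decoder A.
A = [0b0000, 0b0011, 0b0101, 0b1111]  # hypothetical A0..A3
SA = [0, 0, 1, 0]                     # SA0..SA3, one-hot
print(format(multiplexer(A, SA), "04b"))  # 0101
```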

In any given memory access operation (reading or writing), data may be read from or written to each of memory banks A, B, C, and D in a single clock cycle by supplying appropriate address signals and address selection control signals to inputs 56 of multiplexers A, B, C, and D. This allows subwords to be read from or written to memory banks A, B, C, and D in various patterns. In accordance with the present invention, a tiled memory architecture is preferably used that prevents access operations for different ports from clashing.

The address mapping functionality required to prevent subword memory access operations in memory 22 from clashing may be embedded in the circuitry of address mapping circuits that produce the addresses and address control signals for the inputs of multiplexers 52. Illustrative address mapping circuitry 58 that may be used to generate the address and control signals for memory 22 of FIG. 3 is shown in FIG. 4. As shown in FIG. 4, address mapping circuitry 58 may include multiple address mapping circuits 60. In the example of FIG. 4, address mapping circuitry 58 (which may be considered to form part of memory circuitry 22 of FIG. 3) includes four address mapping circuits AMC0, AMC1, AMC2, and AMC3 that are used in producing address signals and associated control signals for multiplexers 52 and address decoders 48 of FIG. 3.

Each of the address mapping circuits 60 receives an address signal on its input 62 and produces corresponding address and control signals on its outputs 56. For example, in response to address signals supplied to its input 62, address mapping circuit AMC0 may produce address signals A0 and associated control signal SA0 on a first output 56, may produce address signals B0 and associated control signal SB0 on a second output 56, may produce address signals C0 and associated control signal SC0 on a third output 56, and may produce address signals D0 and associated control signal SD0 on a fourth output 56. Signals A0 and SA0 are presented to the first input of multiplexer A (FIG. 3), signals B0 and SB0 are provided to the first input of multiplexer B (FIG. 3), signals C0 and SC0 are provided to the first input of multiplexer C (FIG. 3), and signals D0 and SD0 are provided to the first input of multiplexer D.

Address mapping circuits AMC1, AMC2, and AMC3 operate similarly. Each of these circuits is controlled by address signals provided on a corresponding address signal input 62. Address mapping circuit AMC1 provides signals A1 and SA1 to the second input of multiplexer A, provides signals B1 and SB1 to the second input of multiplexer B, provides signals C1 and SC1 to the second input of multiplexer C, and provides signals D1 and SD1 to the second input of multiplexer D. Address mapping circuit AMC2 provides signals A2 and SA2 to the third input of multiplexer A, provides signals B2 and SB2 to the third input of multiplexer B, provides signals C2 and SC2 to the third input of multiplexer C, and provides signals D2 and SD2 to the third input of multiplexer D. Address mapping circuit AMC3 provides signals A3 and SA3 to the fourth input of multiplexer A, provides signals B3 and SB3 to the fourth input of multiplexer B, provides signals C3 and SC3 to the fourth input of multiplexer C, and provides signals D3 and SD3 to the fourth input of multiplexer D.
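The wiring described above (circuit AMC*i* drives input *i* of every multiplexer) can be summarized as a 4×4 matrix of select signals. The following Python sketch is a hypothetical model that assumes all four ports are active in a cycle and that each port asserts only the select line of the bank it targets. Under those assumptions, every multiplexer sees a one-hot select vector exactly when the four ports target four different banks.

```python
BANKS = "ABCD"
NUM_PORTS = 4  # address mapping circuits AMC0..AMC3


def control_matrix(target_banks: list[str]) -> dict[str, list[int]]:
    """Select signals seen by each multiplexer, e.g. SA0..SA3 for mux A.

    target_banks[i] is the bank addressed by port i (circuit AMC_i) in this
    cycle. AMC_i drives input i of every multiplexer and asserts only the
    select line of its target bank.
    """
    return {
        bank: [1 if target_banks[i] == bank else 0 for i in range(NUM_PORTS)]
        for bank in BANKS
    }


def collision_free(target_banks: list[str]) -> bool:
    """True if every multiplexer receives a one-hot select vector."""
    return all(sum(sel) == 1 for sel in control_matrix(target_banks).values())


print(control_matrix(["D", "A", "B", "C"]))
# {'A': [0, 1, 0, 0], 'B': [0, 0, 1, 0], 'C': [0, 0, 0, 1], 'D': [1, 0, 0, 0]}
print(collision_free(["D", "A", "B", "C"]), collision_free(["A", "A", "B", "C"]))
# True False
```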

The address mapping circuitry associated with memory array circuits of the present invention preferably creates address mappings that avoid collisions when accessing adjacent memory ports in memory 22. In many data processing algorithms implemented using processing circuitry on device 10, it may be desirable to access memory 22 in one dimension (e.g., column-wise) when performing a write operation and in an orthogonal dimension (e.g., row-wise) when performing a read operation. In these operations, subwords (bytes) of data may be accessed individually, without processing extraneous data in relatively large (e.g., 32-bit) data words.

An arrangement of this type is illustrated in the diagram of FIG. 5. The FIG. 5 diagram shows an illustrative tiled memory architecture that may be used for memory 22 and that avoids memory port collisions when accessing multiple adjacent subwords using column-wise and row-wise arrangements. The diagram of FIG. 5 corresponds to a memory array 22 that has 64 subwords (bytes) of data storage capacity (as with the memory array of FIG. 3). In FIG. 5, these 64 subwords of data are arranged in an 8×8 array and are associated with 64 separate addresses. For example, the tile (square) in the first row and first column of the array of FIG. 5 is labeled “0” because the address “0” is associated with this subword. As another example, the square in the last column and last row of the memory array of FIG. 5 is labeled “63” because the address “63” is associated with the subword of data stored in this array position.

Although represented as an 8×8 array of subwords, it will be appreciated that any suitable physical layout shape may be used for a given memory array 22. For example, a 64-byte (512-bit) array may be provided using memory cells that are organized in four banks, each with 16 columns of 8 bits, as described in connection with the illustrative arrangement of FIG. 3. The use of the 8×8 arrangement of FIG. 5 is merely illustrative.

The memory locations of the subwords in the array of FIG. 5 are each associated with a respective memory bank. For example, the subword at address “0” is associated with memory bank A. During read and write operations, the subword corresponding to address “0” will be stored in memory bank A (e.g., in the first column of memory bank A). Similarly, the subword at address “1” is associated with memory bank B, the subword at address “2” is associated with memory bank C, the subword at address “3” is associated with memory bank D, etc.
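The text gives bank labels for only some tiles of FIG. 5: addresses 0 through 6 and the column examples at addresses 18, 26, 34, 42 and 30, 38, 46, 54. A skewed (diagonal) assignment, bank = (row + column) mod 4 with row-major addressing (address = 8 × row + column), reproduces every one of those labels. The following Python sketch uses that formula as an assumption consistent with the description, not as a confirmed copy of the figure; the helper names `row_col` and `bank_of` are hypothetical.

```python
ROWS, COLS = 8, 8
BANKS = "ABCD"


def row_col(address: int) -> tuple[int, int]:
    """Row-major position of a subword address in the 8x8 array."""
    if not 0 <= address < ROWS * COLS:
        raise ValueError("address must be in 0..63")
    return divmod(address, COLS)


def bank_of(address: int) -> str:
    """Assumed skewed tiling: bank index = (row + column) mod 4."""
    row, col = row_col(address)
    return BANKS[(row + col) % len(BANKS)]


for r in range(ROWS):
    print(" ".join(bank_of(r * COLS + c) for c in range(COLS)))
# A B C D A B C D
# B C D A B C D A
# C D A B C D A B
# D A B C D A B C
# (rows 4-7 repeat rows 0-3)
```

Each row starts one bank later than the row above it, so both a horizontal run and a vertical run of four tiles cycle through all four banks.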

The memory architecture of FIG. 5 allows adjacent subwords to be written to memory 22 and read from memory 22 in both row-wise and column-wise schemes as needed to efficiently implement various data processing algorithms (e.g., corner turning algorithms, etc.). Because of the tiling pattern that is used in the array of FIG. 5, adjacent memory ports (i.e., ports associated with each of the inputs 62 in FIG. 4) do not clash.

Consider, as an example, a column-wise write operation involving the four subwords 64 of FIG. 5. In this scenario, it is desired to write four subwords (bytes) of data into memory 22: a first subword at address 18, a second subword at address 26, a third subword at address 34, and a fourth subword at address 42. As indicated by the labels of FIG. 5, these subwords are associated with storage locations in memory banks A, B, C, and D, respectively. Because each subword is written into a different memory bank 42 using a different address decoder 48, all four of the subwords can be written in a single simultaneous column-wise write operation (i.e., in one clock cycle). As a result of the pattern of FIG. 5, the same is true for any four adjacent subwords in the memory array, even if the first subword that is written is not written into memory bank A. For example, when performing a column-wise write operation on subwords 66, the subword associated with address 30 is stored in memory bank B, the subword associated with address 38 is stored in memory bank C, the subword associated with address 46 is stored in memory bank D, and the subword associated with address 54 is stored in memory bank A.
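The claim that any four vertically adjacent subwords land in four different banks can be checked exhaustively. The following Python sketch assumes the skewed tiling bank = (row + column) mod 4, which matches the bank labels quoted for subwords 64 and 66. For contrast, it also tests a hypothetical untiled mapping, bank = address mod 4. That naive mapping agrees with the labels for addresses 0 through 6 but puts an entire column in a single bank, so it is the column example, not the first few addresses, that distinguishes a tiled layout.

```python
ROWS, COLS, NUM_BANKS = 8, 8, 4


def skewed_bank(row: int, col: int) -> int:
    """Assumed FIG. 5-style tiling: (row + col) mod 4."""
    return (row + col) % NUM_BANKS


def naive_bank(row: int, col: int) -> int:
    """Hypothetical untiled mapping: bank = address mod 4 (column only)."""
    return (row * COLS + col) % NUM_BANKS


def column_windows():
    """Every group of four vertically adjacent subword positions."""
    for col in range(COLS):
        for top in range(ROWS - NUM_BANKS + 1):
            yield [(top + k, col) for k in range(NUM_BANKS)]


def conflict_free(bank_fn, windows) -> bool:
    """True if every window touches four different banks (one cycle each)."""
    return all(len({bank_fn(r, c) for r, c in w}) == NUM_BANKS for w in windows)


print(conflict_free(skewed_bank, column_windows()))  # True
print(conflict_free(naive_bank, column_windows()))   # False
```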

In some data processing algorithms, it may be desirable to perform a row-wise read operation (e.g., after performing a column-wise write operation). For example, the four adjacent subwords 68 of FIG. 5 may be read in a row-wise fashion. Because of the tiling scheme used for the memory of FIG. 5, each adjacent subword in this row-wise read operation is read from a different memory bank. In particular, the subword at address 3 is read from memory bank D, the subword at address 4 is read from memory bank A, the subword at address 5 is read from memory bank B, and the subword at address 6 is read from memory bank C. Each of these memory banks is different, so the row-wise read operation of adjacent subwords 68 may be performed in a single clock cycle.
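Combining column-wise writes with row-wise reads gives a corner turn, treated here as an 8×8 transpose, which is how the term is commonly used in signal processing. The following Python sketch again assumes the skewed tiling bank = (row + column) mod 4. It moves four adjacent subwords per simulated cycle and raises an error if any cycle would need the same bank twice. The function names are hypothetical.

```python
ROWS, COLS, NUM_BANKS = 8, 8, 4


def bank(row: int, col: int) -> int:
    """Assumed skewed tiling consistent with FIG. 5: (row + col) mod 4."""
    return (row + col) % NUM_BANKS


def one_cycle(cells: list[tuple[int, int]]) -> bool:
    """Four accesses fit in one clock cycle only if they hit four banks."""
    return len({bank(r, c) for r, c in cells}) == NUM_BANKS


def corner_turn(data: list[list[int]]) -> tuple[list[list[int]], int]:
    """Transpose an 8x8 block using column-wise writes and row-wise reads.

    Each access moves four adjacent subwords in one cycle, so the turn
    takes 16 column-wise write cycles plus 16 row-wise read cycles.
    """
    grid = [[0] * COLS for _ in range(ROWS)]
    cycles = 0
    for i, values in enumerate(data):            # input row i -> array column i
        for top in range(0, ROWS, NUM_BANKS):
            cells = [(top + k, i) for k in range(NUM_BANKS)]
            if not one_cycle(cells):
                raise RuntimeError(f"bank conflict writing {cells}")
            for k, (r, c) in enumerate(cells):
                grid[r][c] = values[top + k]
            cycles += 1
    out = []
    for r in range(ROWS):                        # array row r -> output row r
        row_out = []
        for left in range(0, COLS, NUM_BANKS):
            cells = [(r, left + k) for k in range(NUM_BANKS)]
            if not one_cycle(cells):
                raise RuntimeError(f"bank conflict reading {cells}")
            row_out.extend(grid[rr][cc] for rr, cc in cells)
            cycles += 1
        out.append(row_out)
    return out, cycles


data = [[8 * i + j for j in range(COLS)] for i in range(ROWS)]
turned, cycles = corner_turn(data)
print(turned[0][:4], cycles)  # [0, 8, 16, 24] 32
```

For subwords 68 (row 0, columns 3 to 6), the assumed formula yields banks D, A, B, C, matching the text.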

Address mapping circuitry 58 of FIG. 4 may be used in producing the address signals for address decoders 48 of FIG. 3. Consider, as an example, the situation in which subwords 68 of FIG. 5 are being read from memory 22. In this example, a first address (e.g., address “3”) is provided to address mapping circuit AMC0 at its input 62. In response, address mapping circuit AMC0 produces a one-hot encoded control signal in which signal SD0 is high and signals SA0, SB0, and SC0 are low. Address signals associated with the asserted SD0 control signal are provided on address signal output D0. Because address 3 corresponds to the first (lowest address) memory location in memory bank D that is being used in the array of FIG. 5, address D0 may be, for example, 0000.

At the same time that address D0 is being provided by address mapping circuit AMC0, address “4” is being provided to the address input 62 of address mapping circuit AMC1, address “5” is being provided to the address input 62 of address mapping circuit AMC2, and address “6” is being provided to the address input 62 of address mapping circuit AMC3. In response, address mapping circuit AMC1 produces a high SA1 control signal (and corresponding address signals A1) and produces low control signals SB1, SC1, and SD1. Because address 4 corresponds to the second memory location in memory bank A that is being used in the array of FIG. 5, address A1 may be, for example, 0001. Address mapping circuit AMC2 produces a high SB2 control signal (and corresponding address signals B2) and produces low control signals SA2, SC2, and SD2. Because address 5 corresponds to the second memory location in memory bank B that is being used in the array of FIG. 5, address B2 may be, for example, 0001. Address mapping circuit AMC3 produces a high SC3 control signal (and corresponding address signals C3) and produces low control signals SA3, SB3, and SD3. Because address 6 corresponds to the second memory location in memory bank C that is being used in the array of FIG. 5, address C3 may be, for example, 0001.
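Running four such circuits side by side models AMC0 through AMC3 for the row-wise read of addresses 3 to 6. The sketch assumes a hypothetical layout (eight columns per row, row-major addresses, bank = (row + column) mod 4, local address `2 * row + column // 4` in 4 bits) that is consistent with, but not confirmed by, FIG. 5. It reproduces the labels D0, A1, B2, and C3 and the example addresses in the text, and checks the non-blocking property: across the four circuits, each bank's control signal is asserted exactly once.

```python
# ASSUMPTIONS (illustrative only): 8 columns per row, row-major addresses,
# bank = (row + column) mod 4, local address = 2 * row + column // 4 (4 bits).
NUM_COLS = 8
BANKS = "ABCD"


def amc(address: int) -> tuple[str, str]:
    """Hypothetical AMC: return (bank whose control goes high, local address)."""
    row, col = divmod(address, NUM_COLS)
    bank = BANKS[(row + col) % len(BANKS)]
    local = row * (NUM_COLS // len(BANKS)) + col // len(BANKS)
    return bank, format(local, "04b")


addresses = [3, 4, 5, 6]  # inputs 62 of AMC0, AMC1, AMC2, AMC3
outputs = {}
for i, address in enumerate(addresses):
    bank, local = amc(address)
    outputs[f"{bank}{i}"] = local  # asserted output label, e.g. "D0"
print(outputs)  # {'D0': '0000', 'A1': '0001', 'B2': '0001', 'C3': '0001'}

# Non-blocking check: every bank's control is asserted by exactly one AMC.
asserted_banks = sorted(label[0] for label in outputs)
assert asserted_banks == list(BANKS), "two AMCs target the same bank"
```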

When these signals are received by multiplexers 52 of FIG. 3, the asserted control signals configure multiplexers 52 to route the appropriate address signals from their inputs to their outputs. In particular, the asserted SD0 control signal directs multiplexer D of FIG. 3 to route address D0 from its input to the input 54 of address decoder D, the asserted SA1 control signal directs multiplexer A of FIG. 3 to route address A1 from its input to the input 54 of address decoder A, the asserted SB2 control signal directs multiplexer B of FIG. 3 to route address B2 from its input to the input 54 of address decoder B, and the asserted SC3 control signal directs multiplexer C of FIG. 3 to route address C3 from its input to the input 54 of address decoder C.
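The routing performed by multiplexers 52 can be modeled the same way. In this sketch (hypothetical names; assumed layout of eight columns per row, bank = (row + column) mod 4, local address `2 * row + column // 4`), each address mapping circuit drives one control/address pair per bank, and the multiplexer for bank X receives output X_i from circuit AMC_i. The multiplexer passes through the one address whose control input is asserted, which then becomes the input to that bank's address decoder.

```python
# ASSUMPTIONS (illustrative only): 8 columns per row, row-major addresses,
# bank = (row + column) mod 4, local address = 2 * row + column // 4 (4 bits).
NUM_COLS = 8
BANKS = "ABCD"


def amc(address: int) -> dict[str, tuple[bool, str]]:
    """Hypothetical AMC: one (control, address) output pair per bank.
    Only one control is high; de-asserted outputs are don't-cares, and this
    sketch simply drives the same local address on all four."""
    row, col = divmod(address, NUM_COLS)
    target = BANKS[(row + col) % len(BANKS)]
    local = format(row * (NUM_COLS // len(BANKS)) + col // len(BANKS), "04b")
    return {b: (b == target, local) for b in BANKS}


def multiplexer(inputs: list[tuple[bool, str]]) -> str:
    """Pass through the address whose control input is asserted."""
    selected = [addr for ctrl, addr in inputs if ctrl]
    if len(selected) != 1:
        raise ValueError(f"expected exactly one asserted input, got {len(selected)}")
    return selected[0]


amc_outputs = [amc(a) for a in (3, 4, 5, 6)]  # AMC0..AMC3
# Multiplexer X collects output X_i from each AMC_i (X0, X1, X2, X3).
decoder_inputs = {b: multiplexer([out[b] for out in amc_outputs]) for b in BANKS}
print(decoder_inputs)  # {'A': '0001', 'B': '0001', 'C': '0001', 'D': '0000'}
```

The sketch raises an error unless exactly one input is asserted, which is the condition a full four-subword access should satisfy; a real design would also need an idle state for any bank that is not accessed in a given cycle.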

This type of scheme may be used for any four adjacent subwords in both column-wise and row-wise addressing schemes, and in both write operations and read operations. More than four adjacent subwords can be handled simultaneously by partitioning memory 22 into more memory banks (e.g., memory banks E, F, etc.) and by providing corresponding address decoders, multiplexers, and address mapping circuits. If desired, memory architectures such as the memory architecture of FIG. 5 may be used to support other types of simultaneous read and write operations. For example, the tiling scheme of FIG. 5 may be modified so that no memory bank is repeated when reading memory locations along diagonals.
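The generalization to more banks, and to diagonal access, can be explored with a small conflict checker. It assumes a linear skewed tiling bank = (a * row + b * column) mod N on a hypothetical 8 x 8 array; if FIG. 5 follows the rule bank = (row + column) mod 4 (an inference from the example addresses), it corresponds to N = 4, a = b = 1. The case N = 5, a = 1, b = 2 is one possible diagonal-friendly modification chosen purely for illustration, not a scheme taken from this disclosure. The checker tests bank conflicts only; it does not assign local addresses within each bank.

```python
from itertools import product

ROWS, COLS = 8, 8  # hypothetical array size


def bank(row: int, col: int, n_banks: int, a: int, b: int) -> int:
    """Assumed linear skewed tiling: bank = (a*row + b*col) mod n_banks."""
    return (a * row + b * col) % n_banks


def conflict_free(n_banks: int, a: int, b: int, step: tuple[int, int]) -> bool:
    """True if every in-bounds run of n_banks cells along `step` uses
    n_banks distinct banks (i.e., can be accessed in one clock cycle)."""
    dr, dc = step
    for r, c in product(range(ROWS), range(COLS)):
        cells = [(r + i * dr, c + i * dc) for i in range(n_banks)]
        if not all(0 <= rr < ROWS and 0 <= cc < COLS for rr, cc in cells):
            continue  # run leaves the array; skip it
        if len({bank(rr, cc, n_banks, a, b) for rr, cc in cells}) < n_banks:
            return False
    return True


DIRECTIONS = {"row": (0, 1), "column": (1, 0),
              "diagonal": (1, 1), "anti-diagonal": (1, -1)}

for n, a, b in [(4, 1, 1), (5, 1, 2)]:
    results = {name: conflict_free(n, a, b, s) for name, s in DIRECTIONS.items()}
    print(f"N={n}, a={a}, b={b}: {results}")
```

For N = 4, a = b = 1, rows and columns are conflict-free but diagonals are not (a main diagonal alternates between two banks, and an anti-diagonal stays in one). With N = 5, a = 1, b = 2, all four directions pass, because along direction (dr, dc) the bank index advances by a * dr + b * dc each step, and the four step sizes 2, 1, 3, and -1 are all invertible modulo 5.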

The foregoing is merely illustrative of the principles of this invention and various modifications can be made by those skilled in the art without departing from the scope and spirit of the invention.

1. A memory array, comprising: a plurality of memory banks; a plurality of respective address decoders that produce respective address signals for addressing the memory banks; and a plurality of multiplexers each of which has an output that provides a corresponding one of the address signals to a corresponding one of the address decoders, wherein each of the multiplexers has a plurality of inputs, each input having a respective address input and a respective control input, and wherein when a control signal is asserted on the control input of a given one of the inputs of a given multiplexer, address information on the address input of that given one of the inputs is routed to the output of the given multiplexer.
2. The memory array defined in claim 1 further comprising address mapping circuitry that receives a plurality of address signals on respective address mapping circuit inputs and that provides corresponding control signals and associated address information to the inputs of the multiplexers.
3. The memory array defined in claim 2 wherein the memory banks contain memory cells that store data bytes and wherein the address mapping circuitry, multiplexers, and address decoders are configured to write each of a plurality of the data bytes into a respective one of the memory banks in a single clock cycle.
4. The memory array defined in claim 2 wherein the memory banks contain memory cells that store data bytes and wherein the address mapping circuitry, multiplexers, and address decoders are configured to read each of a plurality of the data bytes from a respective one of the memory banks in a single clock cycle.
5. The memory array defined in claim 2 wherein the memory banks contain memory cells that store data bytes and wherein the address mapping circuitry, multiplexers, and address decoders are configured to write each of a plurality of the data bytes into a respective one of the memory banks in a first clock cycle and are configured to read each of a plurality of the data bytes from a respective one of the memory banks in a second clock cycle.
6. The memory array defined in claim 2 wherein the memory banks contain memory cells that store data subwords and wherein the address mapping circuitry, multiplexers, and address decoders are configured to perform a column-wise write operation on the memory circuitry to write each of a plurality of the data subwords into a respective one of the memory banks in a first clock cycle and are configured to perform a row-wise read operation on the memory circuitry to read each of a plurality of the data subwords from a respective one of the memory banks in a second clock cycle.
7. The memory array defined in claim 2 wherein the memory banks contain memory cells that store data subwords and wherein the address mapping circuitry, multiplexers, and address decoders are configured to perform a row-wise write operation on the memory circuitry to write each of a plurality of the data subwords into a respective one of the memory banks in a first clock cycle and are configured to perform a column-wise read operation on the memory circuitry to read each of a plurality of the data subwords from a respective one of the memory banks in a second clock cycle.
8. A method of accessing data in a memory array formed from multiple memory banks, comprising: accessing a plurality of subwords of data from adjacent memory locations within the memory array in a single clock cycle, wherein each subword is stored in a different one of the memory banks.
9. The method defined in claim 8 wherein accessing the plurality of subwords comprises writing each of the subwords into a respective one of the memory banks to perform a column-wise write operation on the memory array.
10. The method defined in claim 9 further comprising: in a different single clock cycle, reading each of the subwords from a respective one of the memory banks to perform a row-wise read operation on the memory array.
11. The method defined in claim 8 wherein accessing the plurality of subwords comprises writing each of the subwords into a respective one of the memory banks to perform a row-wise write operation on the memory array.
12. The method defined in claim 11 further comprising: in a different single clock cycle, reading each of the subwords from a respective one of the memory banks to perform a column-wise read operation on the memory array.
13. The method defined in claim 8 wherein accessing the plurality of subwords comprises reading each of the subwords from a respective one of the memory banks using a respective one of a plurality of address decoders to perform a column-wise read operation on the memory array.
14. The method defined in claim 8 wherein accessing the plurality of subwords comprises reading each of the subwords from a respective one of the memory banks using a respective one of a plurality of address decoders to perform a row-wise read operation on the memory array.
15. An integrated circuit, comprising: a memory array partitioned into a plurality of memory banks, wherein the memory array has a plurality of rows and columns of data byte storage locations, each data byte storage location storing a single data byte; and circuitry for storing a plurality of data bytes in the memory array at respective adjacent data byte storage locations by storing each of the plurality of data bytes in a respective one of the memory banks, wherein the circuitry is configured to store each of the plurality of data bytes in a different one of the adjacent data byte storage locations.
16. An integrated circuit, comprising: a memory array partitioned into a plurality of memory banks, wherein the memory array has a plurality of rows and columns of data byte storage locations; and circuitry for storing a plurality of data bytes in the memory array at respective adjacent data byte storage locations by storing each of the plurality of data bytes in a respective one of the memory banks, wherein the circuitry comprises: bit lines; and a plurality of data register circuits each of which is associated with a respective one of the memory banks and which is connected to that memory bank by a corresponding set of the bit lines.
17. The integrated circuit defined in claim 16 further comprising: a plurality of address decoders each of which has address lines that are coupled only to a respective one of the memory banks.
18. The integrated circuit defined in claim 17 further comprising a plurality of multiplexers each of which has a multiplexer output connected to an input of a respective one of the address decoders.
19. The integrated circuit defined in claim 18 further comprising a plurality of address mapping circuits each of which provides a plurality of different addresses, wherein for each address mapping circuit: each of the plurality of different addresses provided by the address mapping circuit is provided to a different one of the multiplexers.
20. An integrated circuit, comprising: a memory array partitioned into a plurality of memory banks, wherein the memory array has a plurality of rows and columns of data byte storage locations; circuitry for storing a plurality of data bytes in the memory array at respective adjacent data byte storage locations by storing each of the plurality of data bytes in a respective one of the memory banks; and programmable logic that implements processing circuitry that accesses the memory array.